Skip to content

Add conservative_2d: conservative regrid for grids that aren't 1D-separable#70

Draft
thodson-usgs wants to merge 8 commits intoxarray-contrib:mainfrom
thodson-usgs:conservative
Draft

Add conservative_2d: conservative regrid for grids that aren't 1D-separable#70
thodson-usgs wants to merge 8 commits intoxarray-contrib:mainfrom
thodson-usgs:conservative

Conversation

@thodson-usgs
Copy link
Copy Markdown

@thodson-usgs thodson-usgs commented Apr 20, 2026

Summary

Adds conservative regridding for grids that aren't 1D-separable — a capability the existing .conservative method cannot express:

  • Curvilinear grids — accepts 2D lat[i, j] / lon[i, j] coordinate variables (ORCA, tripolar, rotated regional).
  • Unstructured meshesConservativeRegridder.from_polygons(...) accepts arbitrary shapely polygon arrays (MPAS, ICON, finite-element).
  • Grid → polygon aggregation — regrid structured data onto country shapes or watershed boundaries, same from_polygons path. Auxiliary target coordinates (region_id, labels, scalar metadata) ride along with the output.
  • Correct spherical areas on lat/lon — opt-in spherical=True kwarg applies an analytic Lambert cylindrical equal-area projection of cell edges before intersecting. Matches the factored path's sin-weighted accuracy at the same fast-path cost.
  • Antimeridian support — structured lat/lon grids whose longitude spans the dateline are unwrapped automatically. For polygon input, from_polygons(..., periodic=True) unwraps polygon rings that cross ±180°.

Internals:

  • ConservativeRegridder class — reusable regridder; builds the sparse area-intersection matrix once, applies to many fields. .T / .transpose() returns the backward regridder sharing cached state.
  • .to_netcdf / .from_netcdf — persist the weight matrix plus reproducibility metadata (dim names, shapes, grid ranges, spherical flag, xarray-regrid version, timestamp, schema version) in root attributes, with source / target coord-only groups alongside. Schema-versioned for forward compat.
  • Accessor — exposed via .regrid.conservative_2d(...) on xarray DataArrays / Datasets.
  • Optional extrapip install xarray-regrid[conservative-2d] pulls shapely>=2.0 + h5netcdf. Existing users unaffected.

Motivation

.conservative uses Stephan Hoyer's axis-factored 1D overlap — fast and elegant but strictly rectilinear (1D-separable). Users with curvilinear ocean models, unstructured climate meshes, or grid-to-region aggregation currently reach for xESMF, which:

  • Requires ESMPy (C/Fortran via conda-forge only; no Windows support)
  • Doesn't support dask.distributed
  • Is a heavy dep cascade

.conservative_2d fills the gap with pure shapely + sparse. Works on Windows, composes with distributed dask, installs with a single pip install shapely.

Naming follows xESMF's short-method-name convention. "_2d" distinguishes full 2D cell-polygon intersection from the 1D axis-factored approach and doesn't prejudge whether the grid is curvilinear or unstructured.

Design notes

  • Rectilinear fast path: when both grids are axis-aligned rectangles, skip GEOS clipping entirely and compute intersection areas analytically from the bounds. ~30× faster than the GEOS path at 1M cells.
  • STRtree candidate filter uses bounding-box overlap, not the GEOS intersects predicate, by default — much cheaper and the subsequent area > 0 filter drops any false positives. from_polygons(..., predicate_filter=True) restores the predicate filter for pathological inputs (thin diagonals with loose bboxes).
  • Threaded GEOS: when curvilinear, shapely 2's GIL-free shapely.intersection over candidate pairs is parallelized across a ThreadPoolExecutor. ~3-4× on 8 cores.
  • Sparse weight matrix: stored as sparse.COO; apply goes through sparse.matmul inside xr.apply_ufunc(dask=\"parallelized\"). Cached pre-sorted transpose avoids a per-call re-sort.
  • NaN semantics match existing .conservative: two-pass value + mask matmul, nan_threshold with the same interpretation.
  • Coverage mask applied regardless of skipna so uncovered target cells (domain boundaries, polygon holes) always produce NaN.
  • Dtype preserved: float32 in → float32 out. sparse matmul promotes to float64 internally; cast back in _apply_core.

Test plan

  • Structured rectilinear: conservative_2d exactly reproduces .conservative (planar) to 1e-12.
  • Dask + chunked input: identical results to non-dask; spatial chunks rechunked automatically.
  • Curvilinear target (rotated 2D): mass conservation at 5.24e-16 rel.
  • Unstructured mesh: 50k-cell Voronoi round-trip at 1e-15 rel.
  • NaN handling: strict vs lenient thresholds produce the expected ordering.
  • .T transpose on rectilinear and curvilinear; constant field round-trip exact on aligned grids.
  • Spherical mode: matches factored path on lat/lon grids to within grid quadrature.
  • Antimeridian: constant field through a dateline-crossing structured grid is preserved exactly; from_polygons(periodic=True) correctly routes samples across the seam.
  • Auxiliary target coords (non-dim + scalar) attached on from_polygons output.
  • netCDF save/load: bit-identical output on reload; schema mismatch rejected.
  • Dtype preservation: float32/64 roundtrip as expected.
  • Empty target / polygon holes → NaN (regression-guarded).
  • 33-test test_conservative_2d.py module passing on this branch.

Docs

Three demo notebooks under docs/notebooks/demos/:

  • demo_conservative_2d_curvilinear.ipynb — rotated 2D target.
  • demo_conservative_2d_regions.ipynb — structured-grid → region-polygon aggregation.
  • demo_conservative_2d_unstructured.ipynb — Voronoi mesh via from_polygons + .to_netcdf / .from_netcdf.

Notebooks are output-stripped — reviewers / docs build execute them fresh.

Follow-ups (out of scope for this PR)

  • Great-circle (s2) geometry: WIP on spherely-integration branch (PR WIP: Add s2 manifold to conservative_2d (stacks on #70) #71) — drops in manifold=\"s2\" backed by the optional spherely dependency.
  • Grid-to-region aggregation (like xagg): natural extension of from_polygons but wants a dedicated API surface for loading region polygons.

🤖 Generated with Claude Code

@thodson-usgs thodson-usgs changed the title Add polygon-based conservative regridder Add conservative_2d: conservative regrid for grids that aren't 1D-separable Apr 20, 2026
@BSchilperoort
Copy link
Copy Markdown
Contributor

Hi Timothy,

Thanks for contributing. Nice to see that it was possible to add regridding for non-rectilinear grids. I had thought before of using shapely and intersecting polygons, however I could not get it to perform well enough (didn't know about STRtree!).

I do think the LLM went a bit wild and the source file seems a bit overengineered and does not fit the syntax/structuring of the rest of the code. This does not make it very easy to review and see the logic flow.

The LLM also thinks that polygons crossing the antimeridian is a "fringe case", I would disagree 😅

@thodson-usgs
Copy link
Copy Markdown
Author

Honestly, I haven't done much yet :), but I thought a draft PR would be a good way to get the ball rolling.

The bot also identified a few other small bugfixes and upstream optimizations that I'm working through. As those progress, I'm hoping we can pair on this PR, though I want to be respectful of your time. Cloud friendly regridding has long been on our wishlist at USGS (and other groups), so we were excited when xarray-regrid appeared.
@kjdoore tested the package for us awhile back. Unfortunately, he went on to better things, but now regridding might be within the capabilities of the current LLMs...with some steering.

Even if it's never merged, I think this will be a useful datapoint on a longstanding problem.

@thodson-usgs
Copy link
Copy Markdown
Author

thodson-usgs commented Apr 21, 2026

I do think the LLM went a bit wild and the source file seems a bit overengineered and does not fit the syntax/structuring of the rest of the code. This does not make it very easy to review and see the logic flow.

Ah, this might be in part because I fed it a Julia implementation for reference (which should be acknowledged). I'll see if the LLM can clean things up, before I make a pass.

@BSchilperoort
Copy link
Copy Markdown
Contributor

BSchilperoort commented Apr 21, 2026

Even if it's never merged, I think this will be a useful datapoint on a longstanding problem.

Yeah it's a good starting point and at a basic level it seems to perform well. So as a source of inspiration it seems promising.

@slevang
Copy link
Copy Markdown
Collaborator

slevang commented Apr 21, 2026

Nice, a more generalized conservative regridding approach that bypasses ESMF would be a great addition. Since xESMF is the current way to achieve this in python, adding some direct comparisons there would be nice as we've done with some of the other methods.

Broadly agree with @BSchilperoort though. Since this package is very lean and focused as is (only ~2600 lines in src), we should be cautious about dumping in a bunch of code without consideration for maintainability.

@thodson-usgs
Copy link
Copy Markdown
Author

Thanks @slevang,
Right, I forgot ambout xESMF. I will investigate it as well. If this particular PR is out of scope, I'm fine closing it. If it leads to improvements in one or both packages, great. I also expect this exercise will identify some simple upstream performance improvements as claude works through and stress tests and profiles different pieces of the stack.

@slevang
Copy link
Copy Markdown
Collaborator

slevang commented Apr 21, 2026

I don't think it is out of scope, and certainly don't mean to discourage you! Only that we should make sure some humans understand the code and agree that it is clean and maintainable before merging.

The current focus of this package is basically "tricks for rectilinear grids" since those can be highly optimized. But I don't see why we can't expand to cover optimizations for other regridding problems, as long as it all stays semi-coherent. For example I've worked on a much more efficient version of sparse point-wise interpolation for chunked data that I could probably add as a feature here.

Adds ConservativeRegridder and the .regrid.conservative_2d accessor for grids
the existing 1D-factored conservative path can't express: curvilinear (2D
lat/lon coords), unstructured meshes, and arbitrary polygon-to-polygon
aggregation. Cell intersections go through shapely with sparse-COO weight
storage, threaded above ~1k pairs, with optional spherical=True for analytic
equal-area weights on lat/lon grids, antimeridian handling, and netCDF
save/load for caching weight matrices. Demo notebooks under
docs/notebooks/demos/ cover curvilinear, unstructured, and grid->regions
aggregation. Optional dependency: install with the conservative-2d extra.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
thodson-usgs and others added 6 commits April 27, 2026 14:07
Source on [0, 360] vs target on [-180, 180] (and reverse) yielded NaN
on half the cells: a uniform shift can't reconcile the two conventions
since the mean diff is exactly 180° and round() banker's-rounds to 0.
Per-value wrap source longitudes into the target's 360° window, mirror-
ing format_lon, and remap area-matrix columns when wrap breaks source
monotonicity. 2D / curvilinear coords keep the existing uniform-shift
fallback. Adds regression test covering both directions, planar and
spherical.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Swaps the synthetic polygon demo for xarray's air_temperature tutorial
dataset aggregated onto US states dissolved from geodatasets.ncovr.
Same workflow end-to-end (from_polygons, map+bar plot, conservation
check, save/reload) but with recognizable inputs.

Adds a [notebooks] extras group covering the runtime deps the demos
need (matplotlib, pooch, geopandas, geodatasets); shapely/h5netcdf/
sparse/dask stay in [conservative-2d] and [accel] so the two can be
combined.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
The map's vmin/vmax span the full source-grid temperature range, but
the bar chart was normalizing to the bar values' own min/max — so a
state at the cool end was deeper blue on the bar than on the map.
Reuse the map's vmin/vmax for the bar.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Each notebook now opens with motivation (what conservative regridding
is, when to use it), what makes the case 1D-non-separable (curvilinear
coords / arbitrary polygons / unstructured mesh), and a numbered summary
of what the notebook will do. Each step's markdown picks up a sentence
or two of "why this step" narrative — what's expensive vs. cheap, what
to look for in the result, what choices reflect domain conventions —
without changing any of the code or expected outputs.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
…ooks

The unnormalized ``(n_dst, n_src)`` cell-intersection area matrix was
stored as ``self._areas`` but read externally by tests and the demo
notebooks for conservation diagnostics. Promote to ``self.areas`` as a
plain public attribute (consistent with the existing ``self.spherical``,
``self.x_coord``, ``self.y_coord`` style on the same class) and update
the four external callers. Internal storage is the same — no behavior
or perf change (verified against 3ed200b: build/apply timings within
run-to-run noise on rect, curvilinear, and polygon cases).

Also runs ``ruff format`` (4-line collapses in conservative_2d.py and
test_conservative_2d.py; normalizes notebook source from single-string
to list-of-strings) and clears notebook outputs via nbformat-equivalent
JSON manipulation.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
@thodson-usgs
Copy link
Copy Markdown
Author

For example I've worked on a much more efficient version of sparse point-wise interpolation for chunked data that I could probably add as a feature here.

Are you willing to share that code? Even in a rough form?

I propose that we prompt a bot to integrate your optimizations into a new branch. Then write a prompt to benchmark the performance against xESMF. Then a final prompt to profile the code to explain any performance gaps.

This much should all entail minimal human effort. If the results are interesting, we can take it further.

If you prefer not to share the code, I still encourage you to try it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants